You are viewing the RapidMiner Studio documentation for version 10.1.

Sample Collection (Operator Toolbox)

Synopsis

A collection is a list of items. This operator samples a collection down to a given sample size.

Description

The operator provides three sampling methods (see the sampling_method parameter). The sample_size parameter sets the number of items in the sampled output collection.

Input

  • exa (Collection)

    The collection to be sampled.

Output

  • col (Collection)

    The sampled collection.

  • org (Collection)

    The original collection.
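The two output ports can be pictured as a function that returns a new, sampled list together with the untouched input. The sketch below is a hypothetical model, not the operator's implementation: the function name sample_collection_ports is made up, and it uses linear sampling only to stay short. The point it illustrates is that the org port delivers the original collection unchanged alongside the sample.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_collection_ports(exa: Sequence[T], sample_size: int) -> Tuple[List[T], Sequence[T]]:
    """Hypothetical model of the ports: takes the 'exa' input, returns ('col', 'org').

    Linear sampling is used here only to keep the sketch short.
    """
    col = list(exa[:sample_size])  # a new list holding the sample; the input is not modified
    org = exa                      # the original collection, passed through as-is
    return col, org


data = ["x", "y", "z"]
col, org = sample_collection_ports(data, 2)
print(col)          # ['x', 'y']
print(org is data)  # True
```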

Parameters

  • sampling_method The method to use for sampling. In the options below, n is the value of sample_size.
    • linear sampling: Take the first n objects of the collection.
    • shuffled sampling: Take n distinct objects of the collection at random (sampling without replacement).
    • bootstrap sampling: Take n objects of the collection at random with replacement, so the same object can be drawn several times.
    Range:
  • sample_size The number of objects to be drawn. Range:
  • use_local_random_seed Indicates whether a local random seed should be used for randomization. Range:
  • local_random_seed If use_local_random_seed is checked, this parameter determines the local random seed. Range:
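The three sampling methods and the seed parameters can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Operator Toolbox code: the function name and the default values are arbitrary, an unseeded generator stands in for the process-wide random generator, and raising an error when sample_size exceeds the collection size (for linear and shuffled sampling) is an assumed behavior that this page does not document.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_collection(
    collection: Sequence[T],
    sample_size: int,
    sampling_method: str = "shuffled sampling",
    use_local_random_seed: bool = False,
    local_random_seed: int = 42,  # arbitrary default chosen for this sketch
) -> List[T]:
    """Sketch of the three sampling methods; argument names mirror the parameters above."""
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")

    # A seeded generator makes results reproducible; otherwise an unseeded
    # generator stands in for the process-wide random generator (assumption).
    rng = random.Random(local_random_seed) if use_local_random_seed else random.Random()
    items = list(collection)

    if sampling_method == "linear sampling":
        if sample_size > len(items):
            raise ValueError("sample_size exceeds collection size")  # assumed behavior
        return items[:sample_size]  # first n items, original order, no randomness
    if sampling_method == "shuffled sampling":
        # Without replacement; random.sample raises ValueError if n > len(items).
        return rng.sample(items, sample_size)
    if sampling_method == "bootstrap sampling":
        if not items and sample_size > 0:
            raise ValueError("cannot bootstrap from an empty collection")
        return rng.choices(items, k=sample_size)  # with replacement; n may exceed len(items)
    raise ValueError(f"unknown sampling_method: {sampling_method!r}")


items = ["a", "b", "c", "d", "e"]
print(sample_collection(items, 3, "linear sampling"))  # ['a', 'b', 'c']
```

With the same local_random_seed, repeated calls for shuffled or bootstrap sampling return the same sample; linear sampling does not use the random generator at all.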

Tutorial Processes

Grouping an ExampleSet into a collection

In this process, we group the Titanic data set into bins by passenger fare and then randomly select 2 of the resulting fare ranges.
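Outside RapidMiner, the same idea can be sketched in Python: group records into fare ranges to form a collection, then apply shuffled sampling with a sample size of 2. The passenger records below are made-up toy values, not Titanic data, and the equal-width binning with a width of 25 is an arbitrary choice; the tutorial process itself may bin fares differently.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]

# Made-up (passenger_id, fare) pairs for illustration; these are NOT Titanic records.
passengers: List[Row] = [
    ("p1", 5.0), ("p2", 12.0), ("p3", 20.0), ("p4", 30.0),
    ("p5", 40.0), ("p6", 60.0), ("p7", 80.0), ("p8", 90.0),
]


def fare_range(fare: float, width: int = 25) -> str:
    """Map a fare to an equal-width range label such as '[0, 25)'; the width is arbitrary."""
    lower = int(fare // width) * width
    return f"[{lower}, {lower + width})"


def group_by_fare(rows: List[Row]) -> List[Tuple[str, List[Row]]]:
    """Build a collection with one (label, members) entry per fare range, in first-seen order."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[fare_range(row[1])].append(row)
    return list(groups.items())


collection = group_by_fare(passengers)  # 4 fare ranges for the toy data above
rng = random.Random(2024)               # fixed seed, analogous to a local random seed
selected = rng.sample(collection, 2)    # shuffled sampling with sample_size = 2

for label, members in selected:
    print(label, [pid for pid, _ in members])
```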